Fixed Effects – YIGIT ASIK, Data Scientist

Entity & Time Fixed Effects

I mentioned fixed effects in my difference-in-differences post, but I wanted to elaborate a bit further on the topic and show where it’s useful. I’ll dive right into an example and explain along the way.

import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
from linearmodels.panel import PanelOLS

import warnings

warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: '%.3f' % x)

df = pd.read_csv('Grunfeld.csv', index_col=0)
df.head()

	invest	value	capital	firm	year
1	317.600	3078.500	2.800	General Motors	1935
2	391.800	4661.700	52.600	General Motors	1936
3	410.600	5387.100	156.900	General Motors	1937
4	257.700	2792.200	209.200	General Motors	1938
5	330.800	4313.200	203.400	General Motors	1939

I have data on 11 firms: their capital, market value, and investment for each year from 1935 to 1954. This is panel data, since I have multiple observations for each firm across different time periods.
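Before modelling, it can help to confirm that the panel is balanced, meaning every firm shows up exactly once in every year. Here is a minimal sketch on a made-up toy frame (the values and the names `toy` and `is_balanced` are mine, not from the Grunfeld file); the same check should carry over to `df` unchanged.

```python
import pandas as pd

# Toy panel: 2 hypothetical firms observed over the same 3 years
toy = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B", "B"],
    "year": [1935, 1936, 1937, 1935, 1936, 1937],
    "invest": [1.0, 1.2, 1.1, 3.0, 3.3, 3.1],
})

n_firms = toy["firm"].nunique()
n_years = toy["year"].nunique()
has_duplicates = bool(toy.duplicated(["firm", "year"]).any())

# Balanced: every firm-year combination appears exactly once
is_balanced = (len(toy) == n_firms * n_years) and not has_duplicates
print(n_firms, n_years, is_balanced)
```

For the data in this post, the summary tables further down report 11 entities with 20 observations each, which is what a balanced panel looks like.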

Let’s say that I am interested in the relationship between market value and investment. For simplicity, if we had data for a single year, we could estimate the following, where i indexes firms:

\(\displaystyle invest_i = \beta_0 + \beta_1 value_i + \beta_2 capital_i + \epsilon_i\)

However, there are things that we miss with this approach:

There might be firm-level variables that we would like to have in the model but cannot observe. Call them \(\alpha_i\); they are assumed to be constant over time for a given firm.

The idea for dealing with them is pretty neat actually. Think of having two years of data. Let’s say 1935 and 1936:

\(\displaystyle invest_{i \, 1936} = \beta_0 + \beta_{1}value_{i \, 1936} + \beta_{2}capital_{i \, 1936} + \beta_{3}\alpha_i + \epsilon_{i \, 1936}\)

\(\displaystyle invest_{i \, 1935} = \beta_0 + \beta_{1}value_{i \, 1935} + \beta_{2}capital_{i \, 1935} + \beta_{3}\alpha_i + \epsilon_{i \, 1935}\)

Now, if you take the difference, the \(\beta_{3}\alpha_i\) terms cancel (and so does the intercept \(\beta_0\)). What you’re left with is:

\(\displaystyle invest_{i \, 1936} - invest_{i \, 1935} = \beta_{1}(value_{i\,1936} - value_{i\,1935}) + \beta_{2}(capital_{i\,1936} - capital_{i\,1935}) + (\epsilon_{i \, 1936} - \epsilon_{i \, 1935})\)
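To see the cancellation at work, here is a small simulation sketch. Everything in it is made up for illustration (the true slope of 2.0, the way the firm effect is generated, the variable names), and it uses a single regressor instead of two. The firm effect is deliberately correlated with the regressor, so pooled OLS is biased while the differenced regression is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms = 500
beta = 2.0  # true slope, chosen arbitrarily for this simulation

# Unobserved firm effect, correlated with the regressor on purpose
alpha = rng.normal(size=n_firms)
x = alpha[:, None] + rng.normal(size=(n_firms, 2))  # two periods per firm
eps = rng.normal(scale=0.5, size=(n_firms, 2))
y = 1.0 + beta * x + 3.0 * alpha[:, None] + eps

# Pooled OLS with an intercept, ignoring alpha
xc = x.ravel() - x.mean()
yc = y.ravel() - y.mean()
pooled_slope = float(xc @ yc / (xc @ xc))

# First differences: alpha and the intercept drop out,
# so this is a regression through the origin
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
fd_slope = float(dx @ dy / (dx @ dx))

print(f"pooled: {pooled_slope:.2f}, first difference: {fd_slope:.2f}")
```

The pooled estimate lands well above the true slope because it attributes part of the firm effect to the regressor; the first-difference estimate should sit close to 2.0.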

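I believe this is a very intuitive example. With more than two periods, the same thing is achieved by giving each firm its own dummy: the dummies absorb anything that is constant within a firm. So accounting for unobserved firm-level characteristics is just adding firm as a dummy in the regression!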
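Why do dummies do the same job as differencing? Adding one dummy per firm gives the same slope as subtracting each firm’s own mean from every variable (the "within" transformation), and demeaning wipes out anything constant within a firm. A minimal sketch on simulated data, with a made-up slope of 0.5 and a hypothetical helper `demean_by`:

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 6, 8
firm = np.repeat(np.arange(n_firms), n_years)  # firm id for each row

alpha = rng.normal(scale=5.0, size=n_firms)    # unobserved firm effects
x = alpha[firm] + rng.normal(size=firm.size)
y = 0.5 * x + alpha[firm] + rng.normal(scale=0.1, size=firm.size)

# (1) Dummy-variable regression: x plus one indicator column per firm
D = (firm[:, None] == np.arange(n_firms)).astype(float)
X_lsdv = np.column_stack([x, D])
slope_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][0]

# (2) Within transformation: subtract firm means, then regress
def demean_by(v, g):
    """Subtract the group mean of v within each group label in g."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

x_w, y_w = demean_by(x, firm), demean_by(y, firm)
slope_within = float(x_w @ y_w / (x_w @ x_w))

print(slope_lsdv, slope_within)
```

The two slopes agree up to floating-point error. With exactly two periods, first differencing gives this same estimate as well.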

The other thing that I haven’t mentioned above is effects that are constant within a time period but may differ between years. These are shared across firms. Think of things like inflation, market trends, etc.

Well, same idea. Let’s add year as a dummy as well:

lm = smf.ols(
    'invest ~ value + capital + C(firm) + C(year)',
    data=df
)
res = lm.fit()

res.summary()

OLS Regression Results
Dep. Variable:	invest	R-squared:	0.953
Model:	OLS	Adj. R-squared:	0.945
Method:	Least Squares	F-statistic:	122.1
Date:	Sun, 12 Oct 2025	Prob (F-statistic):	5.20e-108
Time:	01:14:10	Log-Likelihood:	-1153.0
No. Observations:	220	AIC:	2370.
Df Residuals:	188	BIC:	2479.
Df Model:	31
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	18.0876	18.656	0.970	0.334	-18.715	54.890
C(firm)[T.Atlantic Refining]	-112.5008	17.752	-6.337	0.000	-147.520	-77.482
C(firm)[T.Chrysler]	-13.5993	17.540	-0.775	0.439	-48.199	21.001
C(firm)[T.Diamond Match]	16.4928	15.692	1.051	0.295	-14.462	47.448
C(firm)[T.General Electric]	-241.0850	28.000	-8.610	0.000	-296.319	-185.851
C(firm)[T.General Motors]	-101.7696	55.177	-1.844	0.067	-210.615	7.075
C(firm)[T.Goodyear]	-77.9628	16.435	-4.744	0.000	-110.383	-45.543
C(firm)[T.IBM]	-6.4573	16.271	-0.397	0.692	-38.554	25.640
C(firm)[T.US Steel]	100.5492	28.438	3.536	0.001	44.450	156.648
C(firm)[T.Union Oil]	-56.7936	16.403	-3.462	0.001	-89.151	-24.436
C(firm)[T.Westinghouse]	-41.7165	17.483	-2.386	0.018	-76.204	-7.229
C(year)[T.1936]	-16.9592	21.518	-0.788	0.432	-59.407	25.488
C(year)[T.1937]	-36.3756	22.364	-1.627	0.106	-80.492	7.741
C(year)[T.1938]	-35.6237	21.162	-1.683	0.094	-77.370	6.122
C(year)[T.1939]	-63.0994	21.505	-2.934	0.004	-105.522	-20.677
C(year)[T.1940]	-39.8248	21.626	-1.842	0.067	-82.486	2.836
C(year)[T.1941]	-16.4878	21.529	-0.766	0.445	-58.957	25.982
C(year)[T.1942]	-17.9993	21.275	-0.846	0.399	-59.967	23.968
C(year)[T.1943]	-37.7724	21.415	-1.764	0.079	-80.016	4.471
C(year)[T.1944]	-38.3201	21.459	-1.786	0.076	-80.652	4.012
C(year)[T.1945]	-49.5395	21.687	-2.284	0.023	-92.322	-6.757
C(year)[T.1946]	-27.7544	21.866	-1.269	0.206	-70.888	15.379
C(year)[T.1947]	-34.8775	21.589	-1.616	0.108	-77.464	7.709
C(year)[T.1948]	-38.3307	21.734	-1.764	0.079	-81.204	4.542
C(year)[T.1949]	-65.2008	21.901	-2.977	0.003	-108.404	-21.998
C(year)[T.1950]	-67.3877	22.028	-3.059	0.003	-110.841	-23.935
C(year)[T.1951]	-54.8346	22.437	-2.444	0.015	-99.095	-10.574
C(year)[T.1952]	-56.4890	22.819	-2.475	0.014	-101.504	-11.474
C(year)[T.1953]	-58.5126	23.819	-2.457	0.015	-105.500	-11.525
C(year)[T.1954]	-81.7939	24.204	-3.379	0.001	-129.540	-34.047
value	0.1167	0.013	9.022	0.000	0.091	0.142
capital	0.3514	0.021	16.696	0.000	0.310	0.393

Omnibus:	32.466	Durbin-Watson:	0.988
Prob(Omnibus):	0.000	Jarque-Bera (JB):	180.276
Skew:	0.311	Prob(JB):	7.14e-40
Kurtosis:	7.391	Cond. No.	3.92e+04

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.92e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

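You can fit the same model with PanelOLS, like below, and get a cleaner table. The coefficients and standard errors match the ones above. The R-squared is lower because PanelOLS reports it for the model after the fixed effects have been removed, rather than counting the dummies as explanatory variables.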

# PanelOLS expects a two-level index: entity first, then time
fe_model = PanelOLS.from_formula(
    'invest ~ value + capital + EntityEffects + TimeEffects',
    data=df.set_index(['firm', 'year'])
)
fe_res = fe_model.fit()

fe_res.summary

PanelOLS Estimation Summary
Dep. Variable:	invest	R-squared:	0.7253
Estimator:	PanelOLS	R-squared (Between):	0.7637
No. Observations:	220	R-squared (Within):	0.7566
Date:	Sun, Oct 12 2025	R-squared (Overall):	0.7625
Time:	01:14:15	Log-likelihood	-1153.0
Cov. Estimator:	Unadjusted
		F-statistic:	248.15
Entities:	11	P-value	0.0000
Avg Obs:	20.000	Distribution:	F(2,188)
Min Obs:	20.000
Max Obs:	20.000	F-statistic (robust):	248.15
		P-value	0.0000
Time periods:	20	Distribution:	F(2,188)
Avg Obs:	11.000
Min Obs:	11.000
Max Obs:	11.000

Parameter Estimates
	Parameter	Std. Err.	T-stat	P-value	Lower CI	Upper CI
value	0.1167	0.0129	9.0219	0.0000	0.0912	0.1422
capital	0.3514	0.0210	16.696	0.0000	0.3099	0.3930

F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)

Included effects: Entity, Time
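The two tables agree on the coefficients because firm and year dummies can be swept out of the data before the regression is run, instead of being estimated one by one. For a balanced panel this amounts to subtracting the firm mean and the year mean and adding back the overall mean. The sketch below checks that on simulated data; the numbers and the helper names `group_means` and `two_way_demean` are made up for illustration, and I am not claiming this is how PanelOLS is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_years = 5, 7
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)

# Regressor and outcome both depend on firm and year
x = rng.normal(size=firm.size) + firm + 0.5 * year
y = 0.3 * x + 2.0 * firm - 1.0 * year + rng.normal(scale=0.1, size=firm.size)

def group_means(v, g):
    """Mean of v within each group in g, broadcast back to row level."""
    return (np.bincount(g, weights=v) / np.bincount(g))[g]

def two_way_demean(v):
    """Remove firm and year means (exact only for a balanced panel)."""
    return v - group_means(v, firm) - group_means(v, year) + v.mean()

x_t, y_t = two_way_demean(x), two_way_demean(y)
slope_demeaned = float(x_t @ y_t / (x_t @ x_t))

# Same model with explicit dummies (first firm and first year as baseline)
D_firm = (firm[:, None] == np.arange(1, n_firms)).astype(float)
D_year = (year[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([x, np.ones_like(x), D_firm, D_year])
slope_dummies = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(slope_demeaned, slope_dummies)
```

Both routes give the same slope. The degrees of freedom line up too: 2 slopes, 10 firm dummies, 19 year dummies and an intercept make 32 parameters, and 220 − 32 = 188, which is the residual degrees of freedom shown in both tables.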

One more thing though: check the covariance type in both tables (nonrobust, unadjusted). It means the errors are assumed to be independent, which is likely violated here. Think about it: observations are grouped in the sense that they belong to the same firm, so they share some unobserved component. Hence, errors might be correlated within each firm (across years).

For the same reason, errors might be correlated within each year (e.g., firms are subject to the same inflation).

So, we should allow residuals to be correlated within groups.

statsmodels can also compute cluster-robust standard errors (cov_type='cluster'), but you have to build and pass the group labels yourself. PanelOLS, on the other hand, already knows the entity dimension (firm) and the time dimension (year) from the index, so two-way clustering is a matter of setting two flags.

fe_model = PanelOLS.from_formula(
    'invest ~ value + capital + EntityEffects + TimeEffects',
    data=df.set_index(['firm', 'year'])
)
# Cluster the standard errors by firm and by year at the same time
fe_res = fe_model.fit(cov_type='clustered', cluster_entity=True, cluster_time=True)

fe_res.summary

PanelOLS Estimation Summary
Dep. Variable:	invest	R-squared:	0.7253
Estimator:	PanelOLS	R-squared (Between):	0.7637
No. Observations:	220	R-squared (Within):	0.7566
Date:	Sun, Oct 12 2025	R-squared (Overall):	0.7625
Time:	01:15:17	Log-likelihood	-1153.0
Cov. Estimator:	Clustered
		F-statistic:	248.15
Entities:	11	P-value	0.0000
Avg Obs:	20.000	Distribution:	F(2,188)
Min Obs:	20.000
Max Obs:	20.000	F-statistic (robust):	84.060
		P-value	0.0000
Time periods:	20	Distribution:	F(2,188)
Avg Obs:	11.000
Min Obs:	11.000
Max Obs:	11.000

Parameter Estimates
	Parameter	Std. Err.	T-stat	P-value	Lower CI	Upper CI
value	0.1167	0.0117	10.015	0.0000	0.0937	0.1397
capital	0.3514	0.0447	7.8622	0.0000	0.2633	0.4396

F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)

Included effects: Entity, Time
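Notice that the coefficients did not move; only the standard errors changed. To get a feel for what a clustered covariance does, here is a rough numpy sketch of the textbook cluster-robust "sandwich" and the usual two-way combination (by firm, plus by year, minus the firm-by-year intersection). The function names `cluster_cov` and `two_way_cluster_cov` are mine, the data is simulated, and there are no small-sample corrections, so do not expect it to reproduce the PanelOLS numbers exactly.

```python
import numpy as np

def cluster_cov(X, resid, groups):
    """Sandwich covariance allowing error correlation within `groups`."""
    bread = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]                  # row i is x_i * u_i
    _, codes = np.unique(groups, return_inverse=True)
    codes = codes.ravel()
    summed = np.zeros((codes.max() + 1, X.shape[1]))
    np.add.at(summed, codes, scores)             # sum scores per cluster
    return bread @ (summed.T @ summed) @ bread

def two_way_cluster_cov(X, resid, g1, g2):
    """Two-way clustering: V(g1) + V(g2) - V(g1 and g2 intersected)."""
    both = np.array([f"{a}|{b}" for a, b in zip(g1, g2)])
    return (cluster_cov(X, resid, g1)
            + cluster_cov(X, resid, g2)
            - cluster_cov(X, resid, both))

# Simulated panel whose errors share a firm component and a year component
rng = np.random.default_rng(3)
n_firms, n_years = 11, 20
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
x = rng.normal(size=firm.size)
u = (rng.normal(size=n_firms)[firm] + rng.normal(size=n_years)[year]
     + rng.normal(size=firm.size))
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

V_firm = cluster_cov(X, resid, firm)
V_two_way = two_way_cluster_cov(X, resid, firm, year)
print(V_firm)
print(V_two_way)
```

Because each firm-year pair appears once, the intersection term is just the heteroskedasticity-robust (White) covariance. One caveat of the two-way formula is that the result is not guaranteed to be positive semi-definite in small samples.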

I feel like this one is a very intuitive example but for more, you can check this.